slimTrain---A Stochastic Approximation Method for Training Separable Deep Neural Networks

Authors

Abstract

Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include nonconvexity, nonsmoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving a series of repeated training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with nonstandard, complex, and scarce datasets, e.g., those arising in scientific applications. To remedy the challenges of DNN training, we propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters and fast initial convergence. The central idea of slimTrain is to exploit the separability inherent in many DNN architectures; that is, we separate the DNN into a nonlinear feature extractor followed by a linear model. This separability allows us to leverage recent advances made for solving large-scale, linear, ill-posed inverse problems. Crucially, for the linear weights, slimTrain does not require a learning rate and automatically adapts the regularization parameter. In our numerical experiments using function approximation tasks arising in surrogate modeling and dimensionality reduction, slimTrain outperforms existing DNN training methods with their recommended hyperparameter settings and reduces the sensitivity of DNN training to the remaining hyperparameters. Since slimTrain operates on mini-batches, its computational overhead per iteration is modest, and savings can be realized by reducing the number of iterations (due to quicker initial convergence) or the number of training problems that need to be solved to identify effective hyperparameters.
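The separability idea is easiest to see in a toy setting. The sketch below is only an illustration and not the authors' implementation: it splits a network into a nonlinear feature extractor phi(x; theta) and linear output weights W, obtains W on each mini-batch from a closed-form ridge-regularized least-squares solve, and updates only theta with a standard stochastic gradient step. The toy architecture, the optimizer, and the fixed ridge parameter lam are assumptions made for the example; slimTrain itself adapts the regularization parameter automatically.

```python
# Minimal sketch (illustration only) of separable training:
# f(x) = W @ phi(x; theta); W comes from a mini-batch ridge solve,
# theta from an ordinary stochastic gradient step with W held fixed.
import torch

torch.manual_seed(0)

# toy data: approximate y = sin(x) from scattered samples
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(x)

# nonlinear feature extractor phi(x; theta)
phi = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
)
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
lam = 1e-3               # fixed ridge parameter (adapted automatically in slimTrain)
W = torch.zeros(1, 32)   # linear output weights

for epoch in range(200):
    perm = torch.randperm(x.shape[0])
    for idx in perm.split(64):
        xb, yb = x[idx], y[idx]
        Z = phi(xb)                                  # mini-batch features, shape (b, 32)

        # 1) linear weights: closed-form ridge solution on this mini-batch
        with torch.no_grad():
            A = Z.T @ Z + lam * torch.eye(Z.shape[1])
            W = torch.linalg.solve(A, Z.T @ yb).T    # shape (1, 32)

        # 2) feature-extractor weights: one stochastic gradient step, W held fixed
        loss = torch.mean((Z @ W.T - yb) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because the solve involves only the small block of output weights, its per-iteration cost stays modest while removing the learning rate for that part of the network.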

Related articles

Provable approximation properties for deep neural networks

We discuss approximation of functions using deep neural nets. Given a function f on a d-dimensional manifold Γ ⊂ R^m, we construct a sparsely-connected depth-4 neural network and bound its error in approximating f. The size of the network depends on the dimension and curvature of the manifold Γ, the complexity of f, in terms of its wavelet description, and only weakly on the ambient dimension m. Es...

Why Deep Neural Networks for Function Approximation?

Recently there has been much interest in understanding why deep neural networks are preferred to shallow networks. We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation. First, ...

Adaptive dropout for training deep neural networks

Recently, it was shown that deep neural networks can perform very well if the activities of hidden units are regularized during learning, e.g., by randomly dropping out 50% of their activities. We describe a method called ‘standout’ in which a binary belief network is overlaid on a neural network and is used to regularize each of its hidden units by selectively setting activities to zero. This ‘adapt...
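As a rough illustration of the mechanism described above (a minimal sketch, not the paper's exact parameterization), each hidden unit's keep probability can be produced by an overlaid network from the same pre-activations; here the overlay is simply a scaled and shifted sigmoid, with alpha and beta as illustrative hyperparameters.

```python
# Minimal sketch of adaptive (data-dependent) dropout for one hidden layer.
import numpy as np

rng = np.random.default_rng(0)

def standout_layer(x, W, b, alpha=1.0, beta=0.0, train=True):
    """Hidden layer whose units are dropped with a data-dependent probability."""
    a = x @ W + b                      # pre-activations, shape (batch, hidden)
    h = np.tanh(a)                     # hidden activities
    keep_prob = 1.0 / (1.0 + np.exp(-(alpha * a + beta)))   # overlaid keep probabilities
    if train:
        mask = rng.random(a.shape) < keep_prob               # sample a binary mask
        return h * mask
    # at test time, scale by the expected mask instead of sampling
    return h * keep_prob

# usage on random data
x = rng.standard_normal((8, 5))
W = rng.standard_normal((5, 16)) * 0.1
b = np.zeros(16)
h_train = standout_layer(x, W, b, train=True)
h_test = standout_layer(x, W, b, train=False)
```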

Exploring Strategies for Training Deep Neural Networks

Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise u...
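A minimal sketch of the greedy layer-wise strategy referred to above, under the assumption that each layer is pretrained as a small autoencoder on the representation produced by the previously trained layers; layer sizes, epochs, and optimizer settings are illustrative only, not taken from the paper.

```python
# Minimal sketch of greedy layer-wise unsupervised pretraining with autoencoders.
import torch

torch.manual_seed(0)
data = torch.randn(256, 20)               # unlabeled toy data
layer_sizes = [20, 16, 8]
encoders = []

inputs = data
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = torch.nn.Linear(d_in, d_out)
    dec = torch.nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(200):                  # train this layer as an autoencoder
        recon = dec(torch.tanh(enc(inputs)))
        loss = torch.nn.functional.mse_loss(recon, inputs)
        opt.zero_grad()
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = torch.tanh(enc(inputs)).detach()   # representation for the next layer

# the pretrained encoders can now initialize a deep network for supervised fine-tuning
pretrained = torch.nn.Sequential(*[m for e in encoders for m in (e, torch.nn.Tanh())])
```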

A conjugate gradient based method for Decision Neural Network training

Decision Neural Network is a new approach for solving multi-objective decision-making problems based on artificial neural networks. By using imprecise evaluation data, network training is improved and the number of training data sets required is reduced. The available training method is based on the gradient descent method (BP), and one of its limitations is its slow convergence. Therefore,...
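For contrast with plain gradient descent, the sketch below trains a tiny network by handing its flattened weights and training loss to a nonlinear conjugate gradient routine (SciPy's method="CG"). The toy architecture, data, and finite-difference gradients are assumptions for illustration; the paper's Decision Neural Network and its specific conjugate gradient variant are not reproduced here.

```python
# Minimal sketch: conjugate gradient training of a tiny two-layer network.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
y = np.sin(X[:, :1])                      # toy regression target

def unpack(w):
    W1 = w[:24].reshape(3, 8)             # first-layer weights
    W2 = w[24:].reshape(8, 1)             # output weights
    return W1, W2

def loss(w):
    W1, W2 = unpack(w)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y) ** 2)

w0 = 0.1 * rng.standard_normal(3 * 8 + 8 * 1)
# method="CG" runs a nonlinear conjugate gradient method on the training loss
res = minimize(loss, w0, method="CG", options={"maxiter": 200})
print("final training loss:", res.fun)
```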

Journal

Journal title: SIAM Journal on Scientific Computing

Year: 2022

ISSN: 1095-7197, 1064-8275

DOI: https://doi.org/10.1137/21m1452512